Overview

Dataset statistics

Number of variables: 16
Number of observations: 891
Missing cells: 869
Missing cells (%): 6.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 99.3 KiB
Average record size in memory: 114.1 B
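The Missing cells (%) figure is the number of missing cells divided by the total number of cells (observations × variables). The minimal check below uses only the per-variable missing counts reported in the Variables section (age 177, deck 688, embarked 2, embark_town 2); the variable names in the code are illustrative.

```python
# Missing counts per variable, from the Variables section of this report
missing_per_variable = {"age": 177, "deck": 688, "embarked": 2, "embark_town": 2}

n_observations = 891
n_variables = 16

missing_cells = sum(missing_per_variable.values())
total_cells = n_observations * n_variables
missing_pct = 100 * missing_cells / total_cells

print(missing_cells, total_cells, f"{missing_pct:.1f}%")  # 869 14256 6.1%
```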

Variable types

Numeric: 7
Categorical: 6
Boolean: 3

Alerts

survived is highly correlated with adult_male [High correlation]
pclass is highly correlated with fare [High correlation]
sibsp is highly correlated with alone [High correlation]
parch is highly correlated with alone [High correlation]
fare is highly correlated with pclass and 1 other field [High correlation]
adult_male is highly correlated with survived [High correlation]
alone is highly correlated with sibsp and 2 other fields [High correlation]

survived is highly correlated with adult_male [High correlation]
pclass is highly correlated with fare [High correlation]
sibsp is highly correlated with alone [High correlation]
parch is highly correlated with alone [High correlation]
fare is highly correlated with pclass [High correlation]
adult_male is highly correlated with survived [High correlation]
alone is highly correlated with sibsp and 1 other field [High correlation]

survived is highly correlated with adult_male [High correlation]
pclass is highly correlated with fare [High correlation]
sibsp is highly correlated with alone [High correlation]
parch is highly correlated with alone [High correlation]
fare is highly correlated with pclass [High correlation]
adult_male is highly correlated with survived [High correlation]
alone is highly correlated with sibsp and 1 other field [High correlation]

embark_town is highly correlated with embarked [High correlation]
sex is highly correlated with who and 2 other fields [High correlation]
who is highly correlated with sex and 2 other fields [High correlation]
alive is highly correlated with sex and 2 other fields [High correlation]
adult_male is highly correlated with sex and 2 other fields [High correlation]
class is highly correlated with deck [High correlation]
embarked is highly correlated with embark_town [High correlation]
deck is highly correlated with class [High correlation]

survived is highly correlated with sex and 2 other fields [High correlation]
pclass is highly correlated with fare and 4 other fields [High correlation]
sex is highly correlated with survived and 3 other fields [High correlation]
age is highly correlated with who and 1 other field [High correlation]
sibsp is highly correlated with parch and 1 other field [High correlation]
parch is highly correlated with sibsp and 1 other field [High correlation]
fare is highly correlated with pclass and 1 other field [High correlation]
embarked is highly correlated with pclass and 2 other fields [High correlation]
class is highly correlated with pclass and 4 other fields [High correlation]
who is highly correlated with sex and 2 other fields [High correlation]
adult_male is highly correlated with survived and 5 other fields [High correlation]
deck is highly correlated with pclass and 1 other field [High correlation]
embark_town is highly correlated with pclass and 2 other fields [High correlation]
alive is highly correlated with survived and 2 other fields [High correlation]
alone is highly correlated with sibsp and 2 other fields [High correlation]

age has 177 (19.9%) missing values [Missing]
deck has 688 (77.2%) missing values [Missing]
Unnamed: 0 is uniformly distributed [Uniform]
Unnamed: 0 has unique values [Unique]
survived has 549 (61.6%) zeros [Zeros]
sibsp has 608 (68.2%) zeros [Zeros]
parch has 678 (76.1%) zeros [Zeros]
fare has 15 (1.7%) zeros [Zeros]

Reproduction

Analysis started: 2022-05-19 06:12:08.434654
Analysis finished: 2022-05-19 06:12:22.408213
Duration: 13.97 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

Unnamed: 0
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 891
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 445
Minimum: 0
Maximum: 890
Zeros: 1
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 44.5
Q1: 222.5
Median: 445
Q3: 667.5
95th percentile: 845.5
Maximum: 890
Range: 890
Interquartile range (IQR): 445

Descriptive statistics

Standard deviation: 257.353842
Coefficient of variation (CV): 0.5783232405
Kurtosis: -1.2
Mean: 445
Median Absolute Deviation (MAD): 223
Skewness: 0
Sum: 396495
Variance: 66231
Monotonicity: Strictly increasing
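Because Unnamed: 0 simply holds the row numbers 0 to 890, its statistics can be recomputed exactly. The sketch below assumes a sample (n - 1) variance and linear interpolation between order statistics for the quantiles (the NumPy and pandas default); under those assumptions it reproduces the figures above. quantile_linear is an illustrative helper, not a pandas-profiling function.

```python
import math
import statistics


def quantile_linear(sorted_values, q):
    """q-th quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


values = list(range(891))  # Unnamed: 0 holds the row numbers 0..890

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = math.sqrt(variance)
cv = std / mean
median = statistics.median(values)
mad = statistics.median([abs(v - median) for v in values])  # median absolute deviation

quantiles = {q: quantile_linear(values, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95)}
iqr = quantiles[0.75] - quantiles[0.25]

print(mean, variance, round(std, 6), round(cv, 10), mad)
print(quantiles, iqr)
```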
Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
0                   1      0.1%
598                 1      0.1%
587                 1      0.1%
588                 1      0.1%
589                 1      0.1%
590                 1      0.1%
591                 1      0.1%
592                 1      0.1%
593                 1      0.1%
594                 1      0.1%
Other values (881)  881    98.9%

Minimum 10 values
Value  Count  Frequency (%)
0      1      0.1%
1      1      0.1%
2      1      0.1%
3      1      0.1%
4      1      0.1%
5      1      0.1%
6      1      0.1%
7      1      0.1%
8      1      0.1%
9      1      0.1%

Maximum 10 values
Value  Count  Frequency (%)
890    1      0.1%
889    1      0.1%
888    1      0.1%
887    1      0.1%
886    1      0.1%
885    1      0.1%
884    1      0.1%
883    1      0.1%
882    1      0.1%
881    1      0.1%

survived
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3838383838
Minimum: 0
Maximum: 1
Zeros: 549
Zeros (%): 61.6%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.4865924543
Coefficient of variation (CV): 1.267701394
Kurtosis: -1.775004671
Mean: 0.3838383838
Median Absolute Deviation (MAD): 0
Skewness: 0.4785234383
Sum: 342
Variance: 0.2367722165
Monotonicity: Not monotonic
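For a 0/1 variable, every statistic above follows from the two counts (549 zeros, 342 ones). The sketch below works from a frequency table and assumes pandas-style bias-adjusted estimators: sample variance with n - 1, adjusted Fisher-Pearson skewness, and the unbiased excess-kurtosis formula. moments_from_counts is a hypothetical helper; the variance and kurtosis should match the report closely, while the skewness may differ slightly in the last digits.

```python
import math


def moments_from_counts(counts):
    """Mean, sample variance, adjusted skewness and excess kurtosis
    from a {value: count} frequency table (pandas-style estimators)."""
    n = sum(counts.values())
    mean = sum(v * c for v, c in counts.items()) / n
    # Central moments in population form (divide by n)
    m2 = sum(c * (v - mean) ** 2 for v, c in counts.items()) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in counts.items()) / n
    m4 = sum(c * (v - mean) ** 4 for v, c in counts.items()) / n

    variance = m2 * n / (n - 1)  # sample variance
    g1 = m3 / m2 ** 1.5          # population skewness
    g2 = m4 / m2 ** 2 - 3        # population excess kurtosis
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return mean, variance, skew, kurt


mean, variance, skew, kurt = moments_from_counts({0: 549, 1: 342})
print(mean, variance, skew, kurt)
```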
Histogram with fixed size bins (bins=2)

Common values
Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

Minimum 10 values
Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

Maximum 10 values
Value  Count  Frequency (%)
1      342    38.4%
0      549    61.6%

pclass
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.308641975
Minimum: 1
Maximum: 3
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 3
Q3: 3
95th percentile: 3
Maximum: 3
Range: 2
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.836071241
Coefficient of variation (CV): 0.3621485054
Kurtosis: -1.280014972
Mean: 2.308641975
Median Absolute Deviation (MAD): 0
Skewness: -0.6305479069
Sum: 2057
Variance: 0.69901512
Monotonicity: Not monotonic
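The pclass figures can likewise be rebuilt from its three value counts (216, 184 and 491). A sketch, assuming the sample (n - 1) variance used elsewhere in the report; value_at is an illustrative helper that walks the cumulative counts instead of expanding all 891 rows.

```python
import math

# pclass value counts from the tables below
counts = {1: 216, 2: 184, 3: 491}

n = sum(counts.values())
total = sum(value * count for value, count in counts.items())
mean = total / n
# Sample variance (n - 1 denominator), computed directly from the counts
variance = sum(count * (value - mean) ** 2 for value, count in counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean


def value_at(position):
    """Value at a 0-based position in the sorted data, found via cumulative counts."""
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if position < seen:
            return value
    raise IndexError(position)


median = value_at(n // 2)  # n = 891 is odd, so the median is the middle element

print(n, total, round(mean, 9), round(variance, 8), round(std, 9), median)
```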
Histogram with fixed size bins (bins=3)

Common values
Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

Minimum 10 values
Value  Count  Frequency (%)
1      216    24.2%
2      184    20.7%
3      491    55.1%

Maximum 10 values
Value  Count  Frequency (%)
3      491    55.1%
2      184    20.7%
1      216    24.2%

sex
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB
male    577
female  314

Length

Max length: 6
Median length: 4
Mean length: 4.704826038
Min length: 4

Characters and Unicode

Total characters: 4192
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
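Python's standard unicodedata module reports the Unicode general category of each character, which is enough to reproduce the character and category counts for sex from its two value counts. unicodedata does not expose scripts or blocks, so the sketch only approximates the ASCII block with a code-point test; it illustrates the idea and is not how pandas-profiling computes these tables.

```python
import unicodedata
from collections import Counter

# Value counts for sex, taken from this report
value_counts = {"male": 577, "female": 314}

# Count characters, weighting each value by how many rows contain it
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

# Unicode general-category codes mapped to the names used in the report
CATEGORY_NAMES = {"Ll": "Lowercase Letter", "Lu": "Uppercase Letter"}
category_counts = Counter()
for ch, count in char_counts.items():
    code = unicodedata.category(ch)
    category_counts[CATEGORY_NAMES.get(code, code)] += count

total_chars = sum(char_counts.values())
all_ascii = all(ord(ch) < 128 for ch in char_counts)  # stand-in for the "ASCII" block check

print(total_chars, len(char_counts), char_counts.most_common(2))
print(dict(category_counts), all_ascii)
```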

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: male
2nd row: female
3rd row: female
4th row: female
5th row: male

Common Values

Value   Count  Frequency (%)
male    577    64.8%
female  314    35.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value   Count  Frequency (%)
male    577    64.8%
female  314    35.2%

Most occurring characters

Value  Count  Frequency (%)
e      1205   28.7%
m      891    21.3%
a      891    21.3%
l      891    21.3%
f      314    7.5%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  4192   100.0%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e      1205   28.7%
m      891    21.3%
a      891    21.3%
l      891    21.3%
f      314    7.5%

Most occurring scripts

Value  Count  Frequency (%)
Latin  4192   100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
e      1205   28.7%
m      891    21.3%
a      891    21.3%
l      891    21.3%
f      314    7.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4192   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
e      1205   28.7%
m      891    21.3%
a      891    21.3%
l      891    21.3%
f      314    7.5%

age
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 88
Distinct (%): 12.3%
Missing: 177
Missing (%): 19.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.69911765
Minimum: 0.42
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0.42
5th percentile: 4
Q1: 20.125
Median: 28
Q3: 38
95th percentile: 56
Maximum: 80
Range: 79.58
Interquartile range (IQR): 17.875

Descriptive statistics

Standard deviation: 14.52649733
Coefficient of variation (CV): 0.4891221855
Kurtosis: 0.1782741536
Mean: 29.69911765
Median Absolute Deviation (MAD): 9
Skewness: 0.3891077823
Sum: 21205.17
Variance: 211.0191247
Monotonicity: Not monotonic
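The age block mixes two denominators: Missing (%) is relative to all 891 rows, while Distinct (%) and the mean use only the 714 non-missing values. A quick arithmetic check using only numbers reported above (the variable names are illustrative):

```python
# Figures reported in the age section above
n_rows = 891
n_missing = 177
n_distinct = 88
age_sum = 21205.17

n_present = n_rows - n_missing                 # non-missing ages
missing_pct = 100 * n_missing / n_rows         # relative to all rows
distinct_pct = 100 * n_distinct / n_present    # relative to non-missing values only
mean_age = age_sum / n_present                 # mean over non-missing values only

print(n_present, f"{missing_pct:.1f}%", f"{distinct_pct:.1f}%", round(mean_age, 8))
```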
Histogram with fixed size bins (bins=50)

Common values
Value              Count  Frequency (%)
24                 30     3.4%
22                 27     3.0%
18                 26     2.9%
28                 25     2.8%
30                 25     2.8%
19                 25     2.8%
21                 24     2.7%
25                 23     2.6%
36                 22     2.5%
29                 20     2.2%
Other values (78)  467    52.4%
(Missing)          177    19.9%

Minimum 10 values
Value  Count  Frequency (%)
0.42   1      0.1%
0.67   1      0.1%
0.75   2      0.2%
0.83   2      0.2%
0.92   1      0.1%
1      7      0.8%
2      10     1.1%
3      6      0.7%
4      10     1.1%
5      4      0.4%

Maximum 10 values
Value  Count  Frequency (%)
80     1      0.1%
74     1      0.1%
71     2      0.2%
70.5   1      0.1%
70     2      0.2%
66     1      0.1%
65     3      0.3%
64     2      0.2%
63     2      0.2%
62     4      0.4%

sibsp
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5230078563
Minimum: 0
Maximum: 8
Zeros: 608
Zeros (%): 68.2%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.102743432
Coefficient of variation (CV): 2.108464374
Kurtosis: 17.88041973
Mean: 0.5230078563
Median Absolute Deviation (MAD): 0
Skewness: 3.695351727
Sum: 466
Variance: 1.216043077
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values
Value  Count  Frequency (%)
0      608    68.2%
1      209    23.5%
2      28     3.1%
4      18     2.0%
3      16     1.8%
8      7      0.8%
5      5      0.6%

Minimum 10 values
Value  Count  Frequency (%)
0      608    68.2%
1      209    23.5%
2      28     3.1%
3      16     1.8%
4      18     2.0%
5      5      0.6%
8      7      0.8%

Maximum 10 values
Value  Count  Frequency (%)
8      7      0.8%
5      5      0.6%
4      18     2.0%
3      16     1.8%
2      28     3.1%
1      209    23.5%
0      608    68.2%

parch
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3815937149
Minimum: 0
Maximum: 6
Zeros: 678
Zeros (%): 76.1%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 2
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8060572211
Coefficient of variation (CV): 2.112344071
Kurtosis: 9.778125179
Mean: 0.3815937149
Median Absolute Deviation (MAD): 0
Skewness: 2.749117047
Sum: 340
Variance: 0.6497282437
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values
Value  Count  Frequency (%)
0      678    76.1%
1      118    13.2%
2      80     9.0%
5      5      0.6%
3      5      0.6%
4      4      0.4%
6      1      0.1%

Minimum 10 values
Value  Count  Frequency (%)
0      678    76.1%
1      118    13.2%
2      80     9.0%
3      5      0.6%
4      4      0.4%
5      5      0.6%
6      1      0.1%

Maximum 10 values
Value  Count  Frequency (%)
6      1      0.1%
5      5      0.6%
4      4      0.4%
3      5      0.6%
2      80     9.0%
1      118    13.2%
0      678    76.1%

fare
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 248
Distinct (%): 27.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.20420797
Minimum: 0
Maximum: 512.3292
Zeros: 15
Zeros (%): 1.7%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 7.225
Q1: 7.9104
Median: 14.4542
Q3: 31
95th percentile: 112.07915
Maximum: 512.3292
Range: 512.3292
Interquartile range (IQR): 23.0896

Descriptive statistics

Standard deviation: 49.6934286
Coefficient of variation (CV): 1.543072528
Kurtosis: 33.39814088
Mean: 32.20420797
Median Absolute Deviation (MAD): 6.9042
Skewness: 4.78731652
Sum: 28693.9493
Variance: 2469.436846
Monotonicity: Not monotonic
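Several fare figures are derived from others, which allows a quick consistency check: mean = sum / n, IQR = Q3 - Q1, range = maximum - minimum, and CV = standard deviation / mean. The sketch uses only values reported above.

```python
# Values reported in the fare section above
n = 891
total = 28693.9493
minimum, maximum = 0.0, 512.3292
q1, q3 = 7.9104, 31.0
std = 49.6934286

mean = total / n
iqr = q3 - q1
value_range = maximum - minimum
cv = std / mean  # coefficient of variation

print(round(mean, 8), round(iqr, 4), value_range, round(cv, 9))
```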
Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
8.05                43     4.8%
13                  42     4.7%
7.8958              38     4.3%
7.75                34     3.8%
26                  31     3.5%
10.5                24     2.7%
7.925               18     2.0%
7.775               16     1.8%
7.2292              15     1.7%
0                   15     1.7%
Other values (238)  615    69.0%

Minimum 10 values
Value   Count  Frequency (%)
0       15     1.7%
4.0125  1      0.1%
5       1      0.1%
6.2375  1      0.1%
6.4375  1      0.1%
6.45    1      0.1%
6.4958  2      0.2%
6.75    2      0.2%
6.8583  1      0.1%
6.95    1      0.1%

Maximum 10 values
Value     Count  Frequency (%)
512.3292  3      0.3%
263       4      0.4%
262.375   2      0.2%
247.5208  2      0.2%
227.525   4      0.4%
221.7792  1      0.1%
211.5     1      0.1%
211.3375  3      0.3%
164.8667  2      0.2%
153.4625  3      0.3%

embarked
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 2
Missing (%): 0.2%
Memory size: 7.1 KiB
S  644
C  168
Q  77

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 889
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: S
2nd row: C
3rd row: S
4th row: S
5th row: S

Common Values

Value      Count  Frequency (%)
S          644    72.3%
C          168    18.9%
Q          77     8.6%
(Missing)  2      0.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
s      644    72.4%
c      168    18.9%
q      77     8.7%

Most occurring characters

Value  Count  Frequency (%)
S      644    72.4%
C      168    18.9%
Q      77     8.7%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  889    100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
S      644    72.4%
C      168    18.9%
Q      77     8.7%

Most occurring scripts

Value  Count  Frequency (%)
Latin  889    100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
S      644    72.4%
C      168    18.9%
Q      77     8.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  889    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
S      644    72.4%
C      168    18.9%
Q      77     8.7%

class
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB
Third   491
First   216
Second  184

Length

Max length: 6
Median length: 5
Mean length: 5.20650954
Min length: 5
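The length statistics weight each category by its frequency, so they can be recomputed from the class value counts. The sketch simply expands the counts into one string length per row:

```python
import statistics

# Value counts for class, taken from this report
value_counts = {"Third": 491, "First": 216, "Second": 184}

# One string length per row: each category's length repeated by its count
lengths = [len(value) for value, count in value_counts.items() for _ in range(count)]

max_len = max(lengths)
median_len = statistics.median(lengths)
mean_len = statistics.mean(lengths)
min_len = min(lengths)
total_chars = sum(lengths)

print(max_len, median_len, round(mean_len, 8), min_len, total_chars)
```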

Characters and Unicode

Total characters: 4639
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Third
2nd row: First
3rd row: Third
4th row: First
5th row: Third

Common Values

Value   Count  Frequency (%)
Third   491    55.1%
First   216    24.2%
Second  184    20.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value   Count  Frequency (%)
third   491    55.1%
first   216    24.2%
second  184    20.7%

Most occurring characters

Value             Count  Frequency (%)
i                 707    15.2%
r                 707    15.2%
d                 675    14.6%
T                 491    10.6%
h                 491    10.6%
F                 216    4.7%
s                 216    4.7%
t                 216    4.7%
S                 184    4.0%
e                 184    4.0%
Other values (3)  552    11.9%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  3748   80.8%
Uppercase Letter  891    19.2%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
i      707    18.9%
r      707    18.9%
d      675    18.0%
h      491    13.1%
s      216    5.8%
t      216    5.8%
e      184    4.9%
c      184    4.9%
o      184    4.9%
n      184    4.9%

Uppercase Letter
Value  Count  Frequency (%)
T      491    55.1%
F      216    24.2%
S      184    20.7%

Most occurring scripts

Value  Count  Frequency (%)
Latin  4639   100.0%

Most frequent character per script

Latin
Value             Count  Frequency (%)
i                 707    15.2%
r                 707    15.2%
d                 675    14.6%
T                 491    10.6%
h                 491    10.6%
F                 216    4.7%
s                 216    4.7%
t                 216    4.7%
S                 184    4.0%
e                 184    4.0%
Other values (3)  552    11.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4639   100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
i                 707    15.2%
r                 707    15.2%
d                 675    14.6%
T                 491    10.6%
h                 491    10.6%
F                 216    4.7%
s                 216    4.7%
t                 216    4.7%
S                 184    4.0%
e                 184    4.0%
Other values (3)  552    11.9%

who
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB
man    537
woman  271
child  83

Length

Max length: 5
Median length: 3
Mean length: 3.794612795
Min length: 3

Characters and Unicode

Total characters: 3381
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: man
2nd row: woman
3rd row: woman
4th row: woman
5th row: man

Common Values

Value  Count  Frequency (%)
man    537    60.3%
woman  271    30.4%
child  83     9.3%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
man    537    60.3%
woman  271    30.4%
child  83     9.3%

Most occurring characters

Value  Count  Frequency (%)
m      808    23.9%
a      808    23.9%
n      808    23.9%
w      271    8.0%
o      271    8.0%
c      83     2.5%
h      83     2.5%
i      83     2.5%
l      83     2.5%
d      83     2.5%
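The character table counts every character of every value, weighted by how often the value occurs, and divides by the total number of characters (3381). A sketch with collections.Counter using the who value counts:

```python
from collections import Counter

# Value counts for who, taken from this report
value_counts = {"man": 537, "woman": 271, "child": 83}

char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count  # each row contributes every character of its value

total_chars = sum(char_counts.values())
top = char_counts.most_common(5)
for ch, count in top:
    print(ch, count, f"{100 * count / total_chars:.1f}%")
```

Counter.most_common keeps first-seen order among tied counts, which here happens to match the report's ordering of m, a and n.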

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  3381   100.0%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
m      808    23.9%
a      808    23.9%
n      808    23.9%
w      271    8.0%
o      271    8.0%
c      83     2.5%
h      83     2.5%
i      83     2.5%
l      83     2.5%
d      83     2.5%

Most occurring scripts

Value  Count  Frequency (%)
Latin  3381   100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
m      808    23.9%
a      808    23.9%
n      808    23.9%
w      271    8.0%
o      271    8.0%
c      83     2.5%
h      83     2.5%
i      83     2.5%
l      83     2.5%
d      83     2.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3381   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
m      808    23.9%
a      808    23.9%
n      808    23.9%
w      271    8.0%
o      271    8.0%
c      83     2.5%
h      83     2.5%
i      83     2.5%
l      83     2.5%
d      83     2.5%

adult_male
Boolean

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1019.0 B
True   537
False  354

Value  Count  Frequency (%)
True   537    60.3%
False  354    39.7%

deck
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): 3.4%
Missing: 688
Missing (%): 77.2%
Memory size: 7.1 KiB
C                 59
B                 47
D                 33
E                 32
A                 15
Other values (2)  17

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 203
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: C
2nd row: C
3rd row: E
4th row: G
5th row: C

Common Values

Value      Count  Frequency (%)
C          59     6.6%
B          47     5.3%
D          33     3.7%
E          32     3.6%
A          15     1.7%
F          13     1.5%
G          4      0.4%
(Missing)  688    77.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
c      59     29.1%
b      47     23.2%
d      33     16.3%
e      32     15.8%
a      15     7.4%
f      13     6.4%
g      4      2.0%
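The two deck tables use different denominators: Common Values divides by all 891 rows (so missing values appear as their own row), while the lowercase table divides by the 203 non-missing values. A sketch reproducing both percentage columns from the counts:

```python
# Value counts for deck, taken from this report
value_counts = {"C": 59, "B": 47, "D": 33, "E": 32, "A": 15, "F": 13, "G": 4}
n_rows = 891                            # all rows, including the missing ones
n_present = sum(value_counts.values())  # non-missing rows only

rows_pct = {k: round(100 * v / n_rows, 1) for k, v in value_counts.items()}
present_pct = {k: round(100 * v / n_present, 1) for k, v in value_counts.items()}

for deck in value_counts:
    print(deck, value_counts[deck], f"{rows_pct[deck]}%", f"{present_pct[deck]}%")
```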

Most occurring characters

Value  Count  Frequency (%)
C      59     29.1%
B      47     23.2%
D      33     16.3%
E      32     15.8%
A      15     7.4%
F      13     6.4%
G      4      2.0%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  203    100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
C      59     29.1%
B      47     23.2%
D      33     16.3%
E      32     15.8%
A      15     7.4%
F      13     6.4%
G      4      2.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  203    100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
C      59     29.1%
B      47     23.2%
D      33     16.3%
E      32     15.8%
A      15     7.4%
F      13     6.4%
G      4      2.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  203    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
C      59     29.1%
B      47     23.2%
D      33     16.3%
E      32     15.8%
A      15     7.4%
F      13     6.4%
G      4      2.0%

embark_town
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 2
Missing (%): 0.2%
Memory size: 7.1 KiB
Southampton  644
Cherbourg    168
Queenstown   77

Length

Max length: 11
Median length: 11
Mean length: 10.53543307
Min length: 9

Characters and Unicode

Total characters: 9366
Distinct characters: 17
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Southampton
2nd row: Cherbourg
3rd row: Southampton
4th row: Southampton
5th row: Southampton

Common Values

Value        Count  Frequency (%)
Southampton  644    72.3%
Cherbourg    168    18.9%
Queenstown   77     8.6%
(Missing)    2      0.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value        Count  Frequency (%)
southampton  644    72.4%
cherbourg    168    18.9%
queenstown   77     8.7%

Most occurring characters

Value             Count  Frequency (%)
o                 1533   16.4%
t                 1365   14.6%
u                 889    9.5%
h                 812    8.7%
n                 798    8.5%
p                 644    6.9%
S                 644    6.9%
m                 644    6.9%
a                 644    6.9%
r                 336    3.6%
Other values (7)  1057   11.3%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  8477   90.5%
Uppercase Letter  889    9.5%

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
o                 1533   18.1%
t                 1365   16.1%
u                 889    10.5%
h                 812    9.6%
n                 798    9.4%
p                 644    7.6%
m                 644    7.6%
a                 644    7.6%
r                 336    4.0%
e                 322    3.8%
Other values (4)  490    5.8%

Uppercase Letter
Value  Count  Frequency (%)
S      644    72.4%
C      168    18.9%
Q      77     8.7%

Most occurring scripts

Value  Count  Frequency (%)
Latin  9366   100.0%

Most frequent character per script

Latin
Value             Count  Frequency (%)
o                 1533   16.4%
t                 1365   14.6%
u                 889    9.5%
h                 812    8.7%
n                 798    8.5%
p                 644    6.9%
S                 644    6.9%
m                 644    6.9%
a                 644    6.9%
r                 336    3.6%
Other values (7)  1057   11.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  9366   100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
o                 1533   16.4%
t                 1365   14.6%
u                 889    9.5%
h                 812    8.7%
n                 798    8.5%
p                 644    6.9%
S                 644    6.9%
m                 644    6.9%
a                 644    6.9%
r                 336    3.6%
Other values (7)  1057   11.3%

alive
Boolean

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1019.0 B
False  549
True   342

Value  Count  Frequency (%)
False  549    61.6%
True   342    38.4%

alone
Boolean

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1019.0 B
True   537
False  354

Value  Count  Frequency (%)
True   537    60.3%
False  354    39.7%
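The alerts linking alone with sibsp and parch are consistent with alone being True exactly when a passenger has no siblings or spouses (sibsp) and no parents or children (parch) aboard. That rule is an assumption, not something the report states; the sketch only checks it against the twenty sample rows listed at the end of the report, and derived_alone is a hypothetical helper.

```python
# (sibsp, parch, alone) for the first 10 and last 10 sample rows of this report
sample = [
    (1, 0, False), (1, 0, False), (0, 0, True), (1, 0, False), (0, 0, True),
    (0, 0, True), (0, 0, True), (3, 1, False), (0, 2, False), (1, 0, False),
    (0, 0, True), (0, 0, True), (0, 0, True), (0, 0, True), (0, 5, False),
    (0, 0, True), (0, 0, True), (1, 2, False), (0, 0, True), (0, 0, True),
]


def derived_alone(sibsp, parch):
    """Hypothetical rule: a passenger is alone when no relatives travel with them."""
    return sibsp + parch == 0


mismatches = [row for row in sample if derived_alone(row[0], row[1]) != row[2]]
print(f"{len(sample) - len(mismatches)} of {len(sample)} sample rows match the rule")
```

Twenty matching rows are suggestive, not proof; the same comparison over all 891 rows would be needed to confirm the rule.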

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it is better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfectly negative monotonic relationship, 0 indicates no monotonic correlation, and +1 indicates a perfectly positive monotonic relationship.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
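The recipe above translates directly into code: rank each variable (tied values share their average rank), then compute Pearson's r on the ranks. A minimal pure-Python sketch; the helper names are illustrative, and pandas and SciPy provide tested implementations.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # average of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations (the 1/n factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
cubes = [v ** 3 for v in x]  # monotonic but nonlinear in x
print(spearman(x, cubes), pearson(x, cubes))
print(average_ranks([10, 20, 20, 30]))
```

For the cubic example, ρ is 1 because the ordering is preserved exactly, while Pearson's r falls below 1 because the relationship is not linear.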

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfectly negative linear relationship, 0 indicates no linear correlation, and +1 indicates a perfectly positive linear relationship. Furthermore, r is invariant under separate changes in location and scale of the two variables (a positive rescaling leaves it unchanged; a negative one only flips its sign), so for an exact linear relationship the slope affects the sign of r but not its magnitude.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
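A minimal sketch of that definition, also checking the location and scale invariance described above. pearson_r is an illustrative helper; in practice pandas' DataFrame.corr computes the same coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(x, y)
# Shifting and positively rescaling each variable leaves r unchanged
r_transformed = pearson_r([10 * a + 7 for a in x], [0.5 * b - 3 for b in y])
# A negative scale factor only flips the sign
r_flipped = pearson_r([-a for a in x], y)
# An exact linear relationship gives r = 1 whatever the (positive) slope
r_line = pearson_r(x, [0.01 * a + 100 for a in x])

print(r, r_transformed, r_flipped, r_line)
```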

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates perfect negative association, 0 indicates no association, and +1 indicates perfect positive association.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
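The counting procedure above, written out directly. This is the tau-a form the text describes, where pairs tied in either variable count as neither concordant nor discordant; library implementations typically default to the tie-adjusted tau-b, so results can differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs (tau-a)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0: the pair is tied in x or y and counts as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])  # 5 concordant, 1 discordant out of 6 pairs
print(tau)
```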

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The standard empirical estimator of Cramér's V has been shown to be biased, even for large samples, so a bias-corrected measure proposed by Bergsma (2013) is used instead.
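A sketch of the bias-corrected estimator as it is usually stated after Bergsma (2013): shrink φ² = χ²/n by its expected bias under independence, correct the table dimensions, then take the square root. It assumes a contingency table with no empty rows or columns and uses the plain Pearson χ² without continuity correction; it is an illustration, not pandas-profiling's own implementation.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table of counts.
    Assumes every row and column total is positive."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic, without continuity correction
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected bias of phi^2 under independence, clipping at zero
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    # Bias-corrected table dimensions
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
v_independent = cramers_v_corrected([[25, 25], [25, 25]])
print(v_perfect, v_independent)
```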

Phik (φk)

Phik (φk) is a newer, practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the phik project.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
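One common way to measure nullity correlation, and the assumption made here, is Pearson's r between the missing/present indicators of two columns. The sketch applies it to age and deck over the first ten sample rows of this report; ten rows are far too few to say anything about the full dataset, and a column with no missing values (or no present values) would make r undefined.

```python
import math


def nullity_correlation(a_missing, b_missing):
    """Pearson correlation between two 0/1 missingness indicators (1 = missing).
    Undefined (division by zero) if either column is entirely missing or entirely present."""
    n = len(a_missing)
    mean_a = sum(a_missing) / n
    mean_b = sum(b_missing) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(a_missing, b_missing)) / n
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in a_missing) / n)
    sd_b = math.sqrt(sum((b - mean_b) ** 2 for b in b_missing) / n)
    return cov / (sd_a * sd_b)


# Missingness of age and deck in the first ten sample rows of this report (1 = NaN)
age_missing = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
deck_missing = [1, 0, 1, 0, 1, 1, 0, 1, 1, 1]

r = nullity_correlation(age_missing, deck_missing)
print(round(r, 4))
```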
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

     Unnamed: 0  survived  pclass  sex     age   sibsp  parch  fare     embarked  class   who    adult_male  deck  embark_town  alive  alone
0    0           0         3       male    22.0  1      0      7.2500   S         Third   man    True        NaN   Southampton  no     False
1    1           1         1       female  38.0  1      0      71.2833  C         First   woman  False       C     Cherbourg    yes    False
2    2           1         3       female  26.0  0      0      7.9250   S         Third   woman  False       NaN   Southampton  yes    True
3    3           1         1       female  35.0  1      0      53.1000  S         First   woman  False       C     Southampton  yes    False
4    4           0         3       male    35.0  0      0      8.0500   S         Third   man    True        NaN   Southampton  no     True
5    5           0         3       male    NaN   0      0      8.4583   Q         Third   man    True        NaN   Queenstown   no     True
6    6           0         1       male    54.0  0      0      51.8625  S         First   man    True        E     Southampton  no     True
7    7           0         3       male    2.0   3      1      21.0750  S         Third   child  False       NaN   Southampton  no     False
8    8           1         3       female  27.0  0      2      11.1333  S         Third   woman  False       NaN   Southampton  yes    False
9    9           1         2       female  14.0  1      0      30.0708  C         Second  child  False       NaN   Cherbourg    yes    False

Last rows

     Unnamed: 0  survived  pclass  sex     age   sibsp  parch  fare     embarked  class   who    adult_male  deck  embark_town  alive  alone
881  881         0         3       male    33.0  0      0      7.8958   S         Third   man    True        NaN   Southampton  no     True
882  882         0         3       female  22.0  0      0      10.5167  S         Third   woman  False       NaN   Southampton  no     True
883  883         0         2       male    28.0  0      0      10.5000  S         Second  man    True        NaN   Southampton  no     True
884  884         0         3       male    25.0  0      0      7.0500   S         Third   man    True        NaN   Southampton  no     True
885  885         0         3       female  39.0  0      5      29.1250  Q         Third   woman  False       NaN   Queenstown   no     False
886  886         0         2       male    27.0  0      0      13.0000  S         Second  man    True        NaN   Southampton  no     True
887  887         1         1       female  19.0  0      0      30.0000  S         First   woman  False       B     Southampton  yes    True
888  888         0         3       female  NaN   1      2      23.4500  S         Third   woman  False       NaN   Southampton  no     False
889  889         1         1       male    26.0  0      0      30.0000  C         First   man    True        C     Cherbourg    yes    True
890  890         0         3       male    32.0  0      0      7.7500   Q         Third   man    True        NaN   Queenstown   no     True
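Several alert pairs (pclass/class, embarked/embark_town, survived/alive, who/adult_male) look like the same information encoded twice, and their value counts match exactly (for example 491/216/184 for both pclass and class). The sketch below checks assumed one-to-one mappings against the twenty sample rows above; the mappings are hypotheses, not something the report states, and a clean result on 20 rows is suggestive rather than proof.

```python
# Hypothesised one-to-one encodings between column pairs (assumptions to verify)
CLASS_NAME = {1: "First", 2: "Second", 3: "Third"}
EMBARK_TOWN = {"S": "Southampton", "C": "Cherbourg", "Q": "Queenstown"}
ALIVE = {0: "no", 1: "yes"}

# (pclass, class, embarked, embark_town, survived, alive, who, adult_male)
# for the first 10 and last 10 sample rows above
rows = [
    (3, "Third", "S", "Southampton", 0, "no", "man", True),
    (1, "First", "C", "Cherbourg", 1, "yes", "woman", False),
    (3, "Third", "S", "Southampton", 1, "yes", "woman", False),
    (1, "First", "S", "Southampton", 1, "yes", "woman", False),
    (3, "Third", "S", "Southampton", 0, "no", "man", True),
    (3, "Third", "Q", "Queenstown", 0, "no", "man", True),
    (1, "First", "S", "Southampton", 0, "no", "man", True),
    (3, "Third", "S", "Southampton", 0, "no", "child", False),
    (3, "Third", "S", "Southampton", 1, "yes", "woman", False),
    (2, "Second", "C", "Cherbourg", 1, "yes", "child", False),
    (3, "Third", "S", "Southampton", 0, "no", "man", True),
    (3, "Third", "S", "Southampton", 0, "no", "woman", False),
    (2, "Second", "S", "Southampton", 0, "no", "man", True),
    (3, "Third", "S", "Southampton", 0, "no", "man", True),
    (3, "Third", "Q", "Queenstown", 0, "no", "woman", False),
    (2, "Second", "S", "Southampton", 0, "no", "man", True),
    (1, "First", "S", "Southampton", 1, "yes", "woman", False),
    (3, "Third", "S", "Southampton", 0, "no", "woman", False),
    (1, "First", "C", "Cherbourg", 1, "yes", "man", True),
    (3, "Third", "Q", "Queenstown", 0, "no", "man", True),
]


def mismatches(rows):
    """Return every row that breaks at least one hypothesised mapping."""
    bad = []
    for row in rows:
        pclass, cls, embarked, town, survived, alive, who, adult_male = row
        consistent = (
            CLASS_NAME[pclass] == cls
            and EMBARK_TOWN[embarked] == town
            and ALIVE[survived] == alive
            and (who == "man") == adult_male
        )
        if not consistent:
            bad.append(row)
    return bad


print(len(rows), mismatches(rows))
```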